Skylion007
Collaborator

@Skylion007 Skylion007 commented Nov 23, 2024

Update CUDA 12.6 to Update 3 and make cusparse-lt 0.6.3? #141365 Was going to leave some comments on #141365, but thought it was just faster to open a PR here.

related to #138440

@Skylion007 Skylion007 requested review from a team and jeffdaily as code owners November 23, 2024 17:00
@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Nov 23, 2024

pytorch-bot bot commented Nov 23, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141433

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 17 Unrelated Failures

As of commit b61ff06 with merge base b75bb64 (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Skylion007 Skylion007 requested review from atalman, eqy, jcaip, malfet, nWEIdia, ptrblck and tinglvv and removed request for jeffdaily and tinglvv November 23, 2024 17:00
@Skylion007 Skylion007 force-pushed the skylion007/update-cuda-12-6-3-libraries-2024-11-23 branch 2 times, most recently from 9e38b70 to 79fbb80 Compare November 23, 2024 17:07
Collaborator Author

@atalman I'll probably need this uploaded to the cu126 s3 bucket. Changes are from the https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-toolkit-major-component-versions list

@Skylion007 Skylion007 force-pushed the skylion007/update-cuda-12-6-3-libraries-2024-11-23 branch from 79fbb80 to 3097a63 Compare November 24, 2024 15:05
@bdhirsh bdhirsh added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Nov 26, 2024
@Skylion007 Skylion007 force-pushed the skylion007/update-cuda-12-6-3-libraries-2024-11-23 branch from 44ab294 to 1afafa5 Compare November 26, 2024 15:59
Contributor

@atalman atalman left a comment

lgtm. Let's wait for a green signal to land.

Collaborator

@tinglvv tinglvv left a comment

Regarding the PyPI matrix packages, these should be nvidia-nvjitlink-cu12==12.6.85 and nvidia-cublas-cu12==12.6.4.1 for 12.6.3.
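To make the pins in this comment easy to check, here is a minimal sketch that compares them against `pip freeze`-style output. The helper name `check_pins` is hypothetical and not part of this PR, and the sketch assumes the installed packages are reported exactly in `name==version` form.

```shell
# Expected pins for CUDA 12.6.3, as quoted in the review comment above.
expected_pins="nvidia-nvjitlink-cu12==12.6.85
nvidia-cublas-cu12==12.6.4.1"

# Hypothetical helper (not from the PR): read `pip freeze`-style lines on
# stdin and report every expected pin that is not present verbatim.
check_pins() {
  frozen="$(cat)"
  status=0
  for pin in $expected_pins; do
    # -F: fixed string, -x: whole-line match, -q: quiet
    if ! printf '%s\n' "$frozen" | grep -Fqx -- "$pin"; then
      echo "missing or mismatched: $pin"
      status=1
    fi
  done
  return "$status"
}
```

Typical use would be `pip freeze | check_pins` inside the environment under test; a non-zero exit status means at least one pin did not match.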

@Skylion007 Skylion007 force-pushed the skylion007/update-cuda-12-6-3-libraries-2024-11-23 branch from 9157523 to b61ff06 Compare November 27, 2024 15:47
@tinglvv tinglvv added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Nov 27, 2024
@tinglvv
Collaborator

tinglvv commented Nov 27, 2024

Adding the ciflow/binaries label to test the x86 nightly wheel.

@Skylion007
Collaborator Author

@atalman Looks like we'll need those binaries uploaded to S3

@atalman
Contributor

atalman commented Nov 27, 2024

@Skylion007 let me upload this now

@Skylion007
Collaborator Author

Remaining failures appear unrelated.

@Skylion007
Collaborator Author

@pytorchbot merge -i

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 28, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 23 checks: Build manywheel docker images for s390x / build-docker-cpu-s390x, windows-binary-wheel / wheel-py3_12-cpu-test, windows-binary-wheel / wheel-py3_9-xpu-test, windows-binary-wheel / wheel-py3_11-xpu-test, windows-binary-wheel / wheel-py3_13-xpu-test, windows-binary-wheel / wheel-py3_10-xpu-test, windows-binary-wheel / wheel-py3_12-xpu-test, macos-arm64-binary-libtorch-cxx11-abi / libtorch-cpu-shared-with-deps-cxx11-abi-build, linux-binary-manywheel / manywheel-py3_11-xpu-test, linux-binary-manywheel / manywheel-py3_10-rocm6_1-test, linux-binary-manywheel / manywheel-py3_11-rocm6_2-test, linux-binary-manywheel / manywheel-py3_10-rocm6_2-test, linux-binary-manywheel / manywheel-py3_13-xpu-test, linux-binary-manywheel / manywheel-py3_9-xpu-test, linux-binary-manywheel / manywheel-py3_9-rocm6_1-test, linux-binary-manywheel / manywheel-py3_12-rocm6_1-test, linux-binary-manywheel / manywheel-py3_12-rocm6_2-test, linux-binary-manywheel / manywheel-py3_12-xpu-test, linux-binary-manywheel / manywheel-py3_10-xpu-test, linux-binary-manywheel / manywheel-py3_11-rocm6_1-test, linux-binary-manywheel / manywheel-py3_9-rocm6_2-test, linux-binary-libtorch-pre-cxx11 / libtorch-rocm6_1-shared-with-deps-pre-cxx11-test, linux-binary-libtorch-pre-cxx11 / libtorch-rocm6_2-shared-with-deps-pre-cxx11-test

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot
Collaborator

@atalman
Contributor

atalman commented Nov 28, 2024

@pytorchmergebot merge -f "failures are not related"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Update CUDA 12.6 to Update 3 and make cusparse-lt 0.6.3? pytorch#141365 Was going to leave some comments on pytorch#141365, but thought it was just faster to open a PR here.
Pull Request resolved: pytorch#141433
Approved by: https://github.com/atalman
function install_cusparselt_063 {
# cuSparseLt license: https://docs.nvidia.com/cuda/cusparselt/license.html
mkdir tmp_cusparselt && pushd tmp_cusparselt
wget -q https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-x86_64/libcusparse_lt-linux-x86_64-0.6.3.2-archive.tar.xz
Collaborator

Missed this: the archive should have been libcusparse_lt-linux-sbsa-0.6.3.2, not the linux-x86_64 one. The x86_64 archive is causing this error: /usr/local/cuda/lib64/libcusparseLt.so: error adding symbols: file in wrong format
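One way to avoid this kind of mismatch is to derive the archive name from the machine architecture instead of hard-coding it. The sketch below is illustrative only: the function names are hypothetical, and the sbsa URL is an assumption that the redist layout mirrors the x86_64 link in the excerpt above, using the archive name the reviewer gives.

```shell
# Hypothetical helper (not from the PR): map a machine architecture, as
# reported by `uname -m`, to the cuSPARSELt redist platform name.
cusparselt_platform() {
  case "$1" in
    x86_64)        echo "linux-x86_64" ;;
    aarch64|arm64) echo "linux-sbsa" ;;
    *)             echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build the archive URL for a version and (optionally) an architecture.
# The body runs in a subshell so its variables do not leak to the caller.
# Assumption: the sbsa archive lives under the same URL layout as x86_64.
cusparselt_archive_url() (
  version="$1"
  arch="${2:-$(uname -m)}"
  platform="$(cusparselt_platform "$arch")" || exit 1
  echo "https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/${platform}/libcusparse_lt-${platform}-${version}-archive.tar.xz"
)
```

For example, `wget -q "$(cusparselt_archive_url 0.6.3.2)"` would pick the platform from the build host, so an aarch64 image would request the sbsa archive rather than the x86_64 one.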

Labels

ciflow/binaries Trigger all binary build and upload jobs on the PR ciflow/trunk Trigger trunk jobs on your pull request Merged open source topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

6 participants